A number of benefits have been reported for computer-based assessments
over traditional paper-based exams: IT support for question development,
reduced distribution and test administration costs, and possible automated
support for grading. However, existing computerized assessment systems do
not support all kinds of questions, in particular open questions that
require written solutions. To overcome these limitations, the objective of
this work is to build an intelligent evaluation system (IES) that addresses
the problems identified and adapts to the different types of questions,
especially open-ended questions whose answers require writing sentences or
programs.

1. INTRODUCTION

More and more scientific findings are becoming available as a result of the ongoing advancements 
in natural science, humanities, and science and technology. Modern educational ideas, concepts, and the 
utilization of contemporary educational technology are necessary in order to address the modernization, 
cosmopolitanization, and future-oriented nature of all subject matter instruction today [1]. The delivery of 
customized education would be ineffective without the constant observation and evaluation of the student, 
both for the purpose of an efficient educational assessment and to enable the system to adjust to the demands 
of the learners. It can be difficult to monitor and evaluate students' performance in a classroom setting, 
especially when it's required to be done in real time. 

Classical assessments face several problems, namely printing, monitoring, space management, and
correction. This calls for automating the evaluation process with computerized assessment systems.
According to the research we carried out, existing evaluation systems are limited in the types of questions
they can ask: they rely on multiple-choice questions (MCQ), true-or-false questions, fill-in-the-blank
questions, and multiple-answer questions. Open-ended questions that require written answers are absent,
although they are of great importance in the learner's theoretical and practical assessments. The intelligent
evaluation system (IES) is an attempt to overcome the problems encountered in classical evaluation systems.
This system offers two additional types of evaluation: i) programming questions, which are answered using
the system compiler, and ii) questions that require written answers in the form of sentences; the correction
of this kind of answer relies on the notion of semantic similarity.

An intelligent evaluation system (IES) in the form of a website has been developed to enhance the
automated correction of assessments and to allow the assessment of learners' learning levels. It provides a
number of options: students can take assessments online, teachers can prepare examinations with a variety of
questions, including open-ended and programming-related questions, and the system automatically corrects
answers against the teachers' reference responses and then assigns a score to the student. The rest of the
paper is structured as follows: section 2 presents the different methods for calculating the semantic and
syntactic similarity between two sentences; section 3 gives the general architecture of the realized system
and the tools used in this project; finally, section 4 concludes.


2. CONCEPT AND TYPES OF SIMILARITIES 
2.1. Introduction 

Measuring the similarity between two sentences (or short texts) consists in evaluating to what extent
the meanings of these sentences are close [2]. This task, known as semantic textual similarity (STS), is used
in several important areas of natural language processing (NLP), among which we can mention information
retrieval, text categorization [3], text summarization [4], and machine translation [5]. Similarity is also
central to classification and clustering: clustering describes data partitioning, and a cluster is a set of
data or elements that share similarities. The description language of the objects of a database must make it
possible to define the distance of each object from the others.

In the field of artificial intelligence, similarity is one of the criteria for the computer analysis of
clusters and for data partitioning. This automatic classification step is necessary for the implementation of
machine learning methods. Expert software also seeks to take into account the context, according to which
the similarity may vary [6]. The more useful and relevant the attributes of the data are in that context, the
more relevant the results produced by the software will be.


2.2. Syntactic similarity 

The term “syntactic” denotes, in the linguistic sense, a method of classifying languages according to
the order in which words appear in the sentence. Measuring the syntactic similarity between words and
between texts plays an important role in the field of data mining [7].


2.2.1. Term frequency-inverse document frequency method 

Term frequency-inverse document frequency (TF-IDF) is a well-known weighting approach that is often
used in information retrieval and text mining [8]. The weight is a numerical statistic intended to reflect the
importance of a word to a document in a collection or corpus [9]. Typically, the TF-IDF weight is composed of
two terms. The first is the normalized TF, which is the number of times a word appears in a document divided
by the total number of words in that document [10]. The second is the IDF, calculated as the logarithm of the
number of documents in the corpus divided by the number of documents containing the specific term. TF is
defined as follows [11].
— Term frequency 

We denote $tf(w, d) = |\{w' \in d : w' = w\}|$, where $w$ is a word and $d = \{w_1, \dots, w_m\}$ is a
document, such that:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \tag{1}$$

where
$n_{i,j}$: the number of occurrences of the word $t_i$ in the document $d_j$.
$\sum_k n_{k,j}$: the total number of words in the document $d_j$, after eliminating punctuation, spaces, and
apostrophes.
— Inverse document frequency 


$$idf_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|} \tag{2}$$

with
$|D|$: total number of documents in the corpus.
$|\{d_j : t_i \in d_j\}|$: number of documents in which the term $t_i$ appears (i.e. $n_{i,j} \neq 0$).
— Calculation of TF-IDF 
Finally, to put it all together, the total TF-IDF weight of a token in a document is the product of
its TF and IDF weights:

$$tfidf_{i,j} = tf_{i,j} \times idf_i \tag{3}$$
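
As an illustration of (1), (2), and (3), the following minimal Python sketch computes the weights on a small corpus. The function names and the three toy documents are assumptions made for this example, and returning 0 for a term absent from the whole corpus is a choice of the sketch, not something specified by the method.

```python
import math
from collections import Counter


def tf(term, doc):
    """Equation (1): occurrences of the term divided by the document length."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Equation (2): log of the corpus size over the number of documents containing the term."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption of this sketch: unseen terms get a zero weight
    return math.log(len(corpus) / containing)


def tfidf(term, doc, corpus):
    """Equation (3): product of the TF and IDF weights."""
    return tf(term, doc) * idf(term, corpus)


# Hypothetical toy corpus: each document is already tokenized and cleaned.
corpus = [
    ["class", "model", "objects", "attributes"],
    ["class", "definition", "operations"],
    ["hurricane", "attacked", "home"],
]

print(tfidf("class", corpus[0], corpus))  # term shared by two documents
print(tfidf("model", corpus[0], corpus))  # term found in one document only
```

A term that occurs in fewer documents ("model") receives a higher weight than one spread over several documents ("class"), which is the behaviour the IDF factor is meant to produce.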


2.2.2. Cosine similarity

Cosine similarity is frequently used as a measure of similarity between two documents [12]. It can
be used to compare the texts of a corpus for classification or information retrieval purposes; in the latter
case, a vectorized document is built from the words of the query and compared, through the cosine of the
angle, with the vectors corresponding to all the documents in the corpus, in order to find the closest ones
[13]. As the angle between two vectors can only be measured on numerical values, the words of a document
must be converted into numbers. This is why we rely on the results of the previous TF-IDF method: the
weights of the words in a document can be considered as a vector. The cosine similarity between two
documents d1 and d2 is then the cosine of the angle between the vector representations of the documents to
compare [14].


The similarity satisfies $sim_{cosine}(d_1, d_2) \in [0, 1]$ and is given by:

$$sim_{cosine}(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|} \tag{4}$$
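
Equation (4) can be sketched in a few lines of Python. The vectors below are hypothetical weight vectors over a shared vocabulary, and returning 0 when one of the vectors is all zeros is a convention chosen for this example.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length weight vectors, as in (4)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention of this sketch for empty documents
    return dot / (norm1 * norm2)


# Hypothetical weight vectors for two documents over the same vocabulary.
d1 = [0.5, 0.0, 0.3]
d2 = [0.4, 0.1, 0.0]
print(cosine_similarity(d1, d2))
```

Because TF-IDF weights are non-negative, the value stays within [0, 1]: identical directions give 1 and documents with no shared term give 0.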


2.3. Semantic similarity 

Semantic similarity is a concept in which a set of documents or terms is given a metric based on
the similarity of their meaning or semantic content [15]. Concretely, this can be done by defining a
topological similarity, for example by using ontologies to define a distance between words, or by defining a
statistical similarity, for example by using a vector space model to correlate terms and contexts [16] drawn
from an appropriate body of text (co-occurrence) [17]. Text similarity is a field of research in which two
terms or expressions are assigned a score based on the likeness of their meaning. According to Kocon and
Maziarz [18], short text similarity measures play an important role in many applications such as word sense
disambiguation, synonymy detection, spell checking, thesauri generation, machine translation, information
retrieval, and question answering [19].


2.3.1. Measure of Wu & Palmer [20] 

In a domain of concepts, similarity is defined with respect to the distance between two concepts in
the hierarchy and their position relative to the root. This similarity takes into account not only the length
of the path between the two concepts C1 and C2 but also the depth of their most specific common subsumer C3,
i.e. the length of the path between C3 and the root [21]. The similarity between C1 and C2 is:


$$ConSim(C_1, C_2) = \frac{2 \times N_3}{N_1 + N_2 + 2 \times N_3} \tag{5}$$

where N1 is the distance between the concept C1 and the concept C3; N2 is the distance between the concept
C2 and the concept C3; N3 is the distance between the concept C3 and the root.

This measure has the advantage of being simple to implement and of performing well compared with the
other similarity measures [22]. Figure 1 describes the relationships between the concepts C1, C2, C3, and the root.


Figure 1. Conceptual relationships [20]
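
To make (5) concrete, the sketch below evaluates it on a small hand-made hierarchy. The taxonomy, the concept names, and the function names are hypothetical and are not taken from WordNet; the value 1.0 returned when both concepts are the root is a convention of this example.

```python
# Hypothetical toy taxonomy (child -> parent); the root has no parent.
PARENT = {
    "root": None,
    "weather": "root",
    "storm": "weather",
    "windstorm": "storm",
    "cyclone": "windstorm",
    "blizzard": "storm",
}


def path_to_root(concept):
    """List of concepts from the given concept up to the root, inclusive."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(c1, c2):
    """Equation (5): 2*N3 / (N1 + N2 + 2*N3)."""
    path1, path2 = path_to_root(c1), path_to_root(c2)
    # C3: the most specific common subsumer, i.e. the first shared ancestor.
    c3 = next(c for c in path1 if c in path2)
    n1 = path1.index(c3)             # distance from C1 to C3
    n2 = path2.index(c3)             # distance from C2 to C3
    n3 = len(path_to_root(c3)) - 1   # distance from C3 to the root
    if n1 + n2 + 2 * n3 == 0:
        return 1.0  # both concepts are the root itself
    return 2 * n3 / (n1 + n2 + 2 * n3)


print(wu_palmer("cyclone", "blizzard"))  # N1 = 2, N2 = 1, N3 = 2
```

With this hierarchy, "cyclone" and "blizzard" share the subsumer "storm", so the score is 4/7; a concept compared with itself scores 1.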


2.3.2. Similarity of Mihalcea [23]

A simple lexical matching approach is described in [23]. Word-to-word similarity measures and a
word specificity measure are used to estimate the semantic similarity of sentence pairs [24]. The
following scoring function was used [25]:

$$Sim(T_1, T_2) = \frac{1}{2}\left(\frac{\sum_{w \in \{T_1\}} maxSim(w, T_2) \times idf(w)}{\sum_{w \in \{T_1\}} idf(w)} + \frac{\sum_{w \in \{T_2\}} maxSim(w, T_1) \times idf(w)}{\sum_{w \in \{T_2\}} idf(w)}\right) \tag{6}$$


where maxSim(w, T) is the maximum score between the word w and the words in T according to a word-to-word
similarity measure, and idf(w) is the inverse document frequency of the word. A threshold of 0.5
was used for classification: a score above the threshold was classified as a paraphrase, and any other score
as a non-paraphrase.

The approach of Mihalcea et al. [23] can take into account the syntactic nature of terms and restrict
similarity comparisons to terms of the same syntactic nature: verbs with verbs, nouns with nouns, and
adjectives with adjectives. Several types of restriction, more or less binding, were tested: from
nouns/nouns, adjectives/adjectives, and verbs/verbs down to proper nouns/proper nouns only. Contrary to what
the similarity tests between terms of the same syntactic nature would suggest, these different tests all led
to a very marked deterioration of the results. One might think, for example, that restricting a named entity
to being comparable only to another named entity cannot damage the results, but experience has shown that
this discrimination leads to a poor similarity between expressions such as “the Japanese president...” and
“in Japan, the president...” [26]. The system whose results are given in the evaluations thus operates
without any restriction as to the syntactic nature of the terms compared [27]. Below is an example of
calculating the similarity between two sentences. Table 1 shows the similarity matrix of Wu and Palmer [20]
for two sentences P1 and P2: i) P1: “eventually, a huge cyclone hit the entrance of my house” and ii) P2:
“finally, a massive hurricane attacked my home”.


Table 1. Similarity matrix of Wu and Palmer [20] between two sentences

               Cyclone/NN   Hit/VBD   Entrance/NN   House/NN
Hurricane/NN   0.9565       -         0.2857        0.3158
Attacked/VBD   -            0.8571    -             -
Home/NN        0.3529       -         0.6667        1.0000


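$$Sim(P_1, P_2) = \frac{1}{2}\left[\frac{0.9565 \log(2) + 0.8571 \log(2) + 1 \times \log(2)}{3 \log(2)} + \frac{0.9565 \log(2) + 0.8571 \log(2) + 0.6667 \log(2) + 1 \times \log(2)}{4 \log(2)}\right]$$

From these results, we find that the two sentences P1 and P2 are similar.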
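
The computation of (6) on this example can be sketched in Python as follows. The word-to-word scores are the values of Table 1; treating the missing cells as 0, restricting the sentences to the nouns and verbs that appear in the table, and giving every word the same weight idf(w) = log(2) are assumptions of this sketch, and the function names are illustrative.

```python
import math

# Word-to-word scores copied from Table 1; pairs marked "-" are treated as 0.
WUP = {
    ("hurricane", "cyclone"): 0.9565,
    ("hurricane", "entrance"): 0.2857,
    ("hurricane", "house"): 0.3158,
    ("attacked", "hit"): 0.8571,
    ("home", "cyclone"): 0.3529,
    ("home", "entrance"): 0.6667,
    ("home", "house"): 1.0,
}


def word_sim(w1, w2):
    """Symmetric lookup of the word-to-word similarity."""
    return WUP.get((w1, w2), WUP.get((w2, w1), 0.0))


def max_sim(word, sentence):
    """maxSim(w, T): best score of the word against any word of the sentence."""
    return max(word_sim(word, other) for other in sentence)


def directed_score(source, target, idf):
    """One of the two fractions of (6): idf-weighted mean of maxSim."""
    weighted = sum(max_sim(w, target) * idf[w] for w in source)
    return weighted / sum(idf[w] for w in source)


def mihalcea_similarity(t1, t2, idf):
    """Equation (6): average of the two directed scores."""
    return 0.5 * (directed_score(t1, t2, idf) + directed_score(t2, t1, idf))


p1 = ["cyclone", "hit", "entrance", "house"]   # content words of P1 in Table 1
p2 = ["hurricane", "attacked", "home"]          # content words of P2 in Table 1
idf = {w: math.log(2) for w in p1 + p2}         # assumed uniform idf weights

print(round(mihalcea_similarity(p1, p2, idf), 4))
```

With uniform weights the idf factors cancel, so the score reduces to the average of the row-wise and column-wise maxima of the matrix, which is close to 0.90 here and well above the 0.5 paraphrase threshold mentioned earlier.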


3. DESCRIPTION OF THE REALIZED EVALUATION SYSTEM 
3.1. Architecture of IES 
The developed IES is a website for assessing learners' level of learning. It offers several
features, namely: taking assessment tests online (students), preparing exams that contain all kinds of
questions, including programming questions and open questions, and automatic correction based on the answers
provided by teachers. IES automates the process of assessing learners' competencies and involves several
parties communicating with each other. Figure 2 shows the architecture of IES. The system consists of the
following components:
— Teacher space: this space allows teachers to register and authenticate in the platform to build an exam 
(questions, answers, and notation) and to enter the codes of students who are allowed to take the exam. 
— Database: the database contains the information of the professors and students, as well as the exams 
(questions and answers proposed by the professors). 
— Student space: this space allows students to register and authenticate in the platform to take exams.


3.2. Mechanism for correcting the open questions 

To correct open questions (questions that require written responses), the IES system makes use
of the notion of semantic similarity, calculated between the answer provided by the learner and the answer
given by the teacher. The correction proceeds in several steps: i) syntactic correction of the answers,
ii) sentence segmentation, part-of-speech tagging, and extraction of named entities using OpenNLP,
iii) elimination of stop words and punctuation, and iv) calculation of the semantic similarity using the
approach of Mihalcea et al. [23].

Figure 2. Architecture of the developed IES
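
As a small illustration of the cleaning step (elimination of stop words and punctuation), the following Python sketch processes the teacher's answer used later in the application example. The stop-word list and the function name are hypothetical; the system itself relies on OpenNLP, which is not used here.

```python
import re

# Hypothetical, deliberately short stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "for", "with", "in", "on"}


def clean_answer(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


teacher = ("A class is a definition model for objects with the same set "
           "of attributes, and the same set of operations.")
print(clean_answer(teacher))
```

The remaining tokens are the content words on which the similarity matrix is later computed.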


3.2.1. Syntax correction 

Syntax correction is a technique that corrects the errors of a sentence based on a corpus and a
dictionary. To correct a sentence, we first need to determine its correction center, i.e. the position of the
word in the sentence that has the maximum frequency in the corpus, using (7) [28]:

$$pos = \{i : maxfreq_w = freq_{w_i}\} \tag{7}$$

where $maxfreq_w$ is the maximum frequency among the words of the sentence.

The second step is the detection of the erroneous words of the sentence by searching for and
comparing the words of the sentence with the words of the dictionary. The distance between each erroneous
word and the words of the dictionary is then calculated, in order to recover all the words close to the
erroneous word, by (8) and (9) [28]:

$$dist(w_1, w_2) = |\{c_i : (c_i \in w_1^{k} \wedge c_i \in w_2^{k}) \vee (c_i \in w_1^{k} \wedge c_i \in w_2^{k+1}) \vee (c_i \in w_1^{k+1} \wedge c_i \in w_2^{k}) \vee (c_i \in w_1^{k+1} \wedge c_i \in w_2^{k+1})\}| \tag{8}$$

$$words\_corrects(w) = \{w_i : dist(w, w_i) = max\_dist(w)\} \tag{9}$$

Equation (8) is used to calculate the distance between the erroneous word and all the words in the
dictionary. Equation (9) is used to retrieve all the words close to the erroneous word, where max_dist is the
maximum distance between the erroneous word and the dictionary words.
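
The retrieval of candidate words can be sketched as follows. This is only one possible reading of (8): a character of the erroneous word is counted when the same character occurs at the same position or at an adjacent position in the dictionary word. The function names and the four-word dictionary are hypothetical.

```python
def dist(w1, w2):
    """Count the characters of w1 found at the same or an adjacent position in w2."""
    score = 0
    for k, ch in enumerate(w1):
        window = w2[max(0, k - 1):k + 2]  # positions k-1, k, k+1 of w2
        if ch in window:
            score += 1
    return score


def close_words(word, dictionary):
    """Equation (9): dictionary words whose score equals the maximum score."""
    scores = {candidate: dist(word, candidate) for candidate in dictionary}
    best = max(scores.values())
    return [candidate for candidate, score in scores.items() if score == best]


# Hypothetical dictionary; "definie" is the misspelling from the application example.
dictionary = ["define", "defined", "model", "method"]
print(close_words("definie", dictionary))
```

Here "define" and "defined" obtain the same score, so both are kept as candidates; the choice between them is left to the context-based step that follows.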

Afterwards, the correction of the erroneous word is based on the correct words of the sentence, the
correction center, and the list of words close to the erroneous word. This processing applies a recursive
technique [12] and n-gram correction to the left and to the right: the correct word is chosen among the words
of the list by calculating the frequency of each candidate when followed or preceded by n correct words of
the sentence [29]. The word retained is the one that has a non-zero frequency with a maximum n.

$$correct^{right}(W) = \{w_i : maxfreq_{i>pos} = freq_{w_i,\, i>pos} \text{ and } w_i \in words\_corrects(W)\} \tag{10}$$

$$correct^{left}(W) = \{w_i : maxfreq_{i<pos} = freq_{w_i,\, i<pos} \text{ and } w_i \in words\_corrects(W)\} \tag{11}$$

If the erroneous word is located after the correction center, then (10) is used; otherwise, (11) is
used. Here W represents the erroneous word. Finally, we can correct the sentence, using the left or right
correction and the recursive technique, by (12):

$$correct(phrase) = \{correct^{left}(W_{i<pos});\; correct^{right}(W_{i>pos})\} \tag{12}$$

where $W_{i<pos}$ denotes the words of the sentence located before the correction center and $W_{i>pos}$
denotes the words located after it.
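
The context-based choice among candidates can be illustrated with a simplified Python sketch. It only handles the case n = 1 with the preceding word as context, and the bigram counts are invented for the example; a real system would derive them from its corpus and would also apply the right-hand context and the recursive technique.

```python
# Hypothetical bigram counts standing in for corpus frequencies.
BIGRAM_FREQ = {
    ("and", "define"): 12,
    ("and", "defined"): 3,
    ("define", "the"): 20,
}


def pick_by_left_context(candidates, left_word, bigram_freq):
    """Return the candidate most often preceded by left_word, or None if none was seen."""
    best = max(candidates, key=lambda c: bigram_freq.get((left_word, c), 0))
    if bigram_freq.get((left_word, best), 0) == 0:
        return None  # no candidate has a non-zero frequency in this context
    return best


# "generate and definie ...": the correct word preceding the error is "and".
print(pick_by_left_context(["define", "defined"], "and", BIGRAM_FREQ))
```

Requiring a non-zero frequency mirrors the rule stated above: a candidate never observed in the given context is not accepted.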


3.2.2. Segmentation of sentences and labeling of parts of speech

In this step, we used the Apache OpenNLP natural language processing library to segment the sentence
(the answer to the question) and to tag each word of the sentence with its part of speech. These tags are used
to calculate the semantic similarity between the answer proposed by the teacher and the one given by the
student, using the approach of Mihalcea et al. [23]: we generate the Wu & Palmer similarity matrix for the
words of the two sentences that have the same grammatical category, and then apply (6). If the similarity is
greater than or equal to 0.75, we consider that the sentences are similar, that is to say the answer given by
the student is correct, so the student receives the full mark for the question. Otherwise, we calculate the
question score with (13) by multiplying the similarity of the two answers by the full mark of the question.

$$note(q) = Sim(R, R_p) \times bar \tag{13}$$

where $R$ is the answer of the student, $R_p$ is the answer proposed by the professor, and $bar$ is the full
mark of the question.


Figure 3 shows the different steps of the similarity calculation, starting with sentence cleaning, 
segmentation, generation of the similarity matrix and ending with the Mihalcea similarity calculation. 


[Flowchart: Step 1: elimination of stop words; Step 2: correction of errors (correction approach); Step 3:
segmentation (OpenNLP); Step 4: Wu & Palmer similarity matrix (WordNet); Step 5: Mihalcea similarity]

Figure 3. Flowchart of the semantic similarity calculation process


— Apache OpenNLP: the Apache OpenNLP library is a toolkit for processing natural language text. It
supports: i) tokenization, ii) sentence segmentation, iii) part-of-speech tagging, iv) named entity
extraction, v) chunking, and vi) parsing.

— WordNet: a wide-ranging lexical database for the English language, developed for over 20 years by
linguists from the cognitive science laboratory at Princeton University. It is freely usable, and many
resources have been created manually or automatically from, as extensions of, or in addition to WordNet.
Programs from the world of artificial intelligence have also established bridges with WordNet. It is a
semantic network of the English language, which is based on a psychological theory of language. The first
version released dates back to June 1991 [13].


Realization of an intelligent evaluation system (Otman Maarouf) 


290 o ISSN: 2252-8776 


3.2.3. Application example 

The statement of the question is as follows: “give the definition of java class”. The answer proposed
by the professor is as follows: “a class is a definition model for objects with the same set of attributes, and
the same set of operations”. The answer written by the student is: “a class is a model to generate and definie
th objects for have the same attributes and method”. The mark reserved for this question is 3 points. The
correction of this type of question starts with the verification of spelling errors; in this example we found
two errors, “definie” and “th”. After the syntax correction of this answer [28], we consider that the new
answer is “a class is a model to generate and define the objects for the same attributes and method”.
Figure 4 shows an example of calculating the similarity between the responses of the teacher and the student.


[Worked example following the five steps of the similarity calculation:
Teacher sentence: “A class is a definition model for objects with the same set of attributes, and the same set
of operations.”
Student sentence: “A class is a model to generate and definie th objects for have the same attributes and
method”
Steps 1 and 2 (elimination of stop words, correction of errors): student: “Class model generate define objects
attributes method”; teacher: “Class definition model objects same set attributes operations”
Step 3 (segmentation with OpenNLP): student: Class/NN model/NN generate/VB define/VB objects/NN
attributes/NN method/NN; teacher: Class/NN definition/NN model/NN objects/NN same/JJ set/NN attributes/NN
operations/NN
Step 4: Wu & Palmer similarity matrix (WordNet) between the tagged words of the two sentences
Step 5 (Mihalcea similarity): Sim(S1, S2) = 0.71]

Figure 4. Example of calculating the semantic similarity between the student's sentence and the teacher's
sentence


If the similarity between the proposed answer and the student's response is less than 0.25, we
consider that the answer is false, i.e., the grade given to the student for this question is 0. If the
similarity lies in [0.25, 0.75), we consider that the answer is partially correct, and the student's mark for
this question equals sim * note. Finally, if the similarity is greater than or equal to 0.75, we consider that
the answer is correct and the student receives the full mark. In the example, the similarity between the
proposed answer and the student's answer equals 0.71, so the student's score is 0.71 * 3 = 2.13. Figure 5
explains the process used to calculate the score of the student's response.

[Flowchart: Similarity(Sentence1, Sentence2) = Sim; if Sim < 0.25, Note_F = 0; if 0.25 ≤ Sim < 0.75,
Note_F = Sim * Note; otherwise, Note_F = Note]

Figure 5. Flowchart of the score calculation process
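
The scoring rule of Figure 5 can be written as a short Python function. The function and parameter names are illustrative; the thresholds 0.25 and 0.75 and the example values (similarity 0.71, full mark 3) are those given in the text.

```python
def grade(sim, full_mark, low=0.25, high=0.75):
    """Map a similarity score to a mark using the two thresholds of Figure 5."""
    if sim < low:
        return 0.0                 # answer considered false
    if sim >= high:
        return float(full_mark)    # answer considered correct
    return sim * full_mark         # partially correct answer


print(grade(0.71, 3))  # the application example, about 2.13
```

Keeping the thresholds as parameters would let a teacher tighten or relax the rule per exam without changing the logic.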


4. CONCLUSION 

In this end-of-studies project, we have developed an intelligent evaluation system (IES), which
makes it possible to assess learners effectively; this effectiveness lies in the diversification of the types
of questions that can be proposed in an evaluation. The particularity of this system compared to existing
ones is that it adds other types of questions, namely: i) programming questions, where the learner must
answer the question with a program written in a programming language, and ii) open questions, where the
learner has to write the answer to the question as a text. The realization of this system required several
steps: starting with the study of existing systems, then looking for improvements and resolution approaches,
and finally the choice of tools and the development of the system. Several perspectives are conceivable,
namely: i) adapting this evaluation system to correct all types of questions in all subjects and ii) using
multilingual dictionaries and corpora to correct answers to questions in different languages.